The Communication Performance of the Cray T3D and its Effect on Iterative Solvers
Abstract
On many distributed memory systems, such as workstation clusters or the Intel iPSC/860, the multigrid algorithm suffers from extensive communication requirements and, in general, is not very competitive with the conjugate gradient algorithm. This is in contrast to the sequential case, where the multigrid algorithm is very effective in reducing the global residual, particularly for very large linear systems of equations. These two algorithms are now compared on the Cray T3D for solving very large systems of linear equations (resulting from grids of the order of 256^3 cells). The communication performance of the Cray T3D is first measured by the standard ping-pong tests and also by practical communication tasks that are found frequently in CFD calculations. It is found that the Cray T3D has low-latency (≈ 6 μs) and high-bandwidth (120 MB/s) interprocessor communication when the low-level intrinsic communication routines are used. As a result, the multigrid algorithm is found to be very competitive with the conjugate gradient algorithm for solving the very large linear systems arising from the Direct Numerical Simulation of turbulent Combustion (DNSC). Results are contrasted with those on the Intel iPSC/860.
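To see why the quoted latency and bandwidth matter for an algorithm that sends many small messages, the two figures can be put into a simple linear cost model, time = latency + size / bandwidth. The sketch below is only an illustration: the model, the function names, and the choice of 1 MB = 10^6 bytes are assumptions, and it is not the measurement procedure used in the paper.

```python
# Linear (latency + bandwidth) cost model for one point-to-point message.
# The two constants are the figures quoted in the abstract; the model itself
# and all names below are assumptions made for illustration.

LATENCY_S = 6e-6        # about 6 microseconds start-up cost per message
BANDWIDTH_BPS = 120e6   # 120 MB/s, taking 1 MB = 10**6 bytes (assumption)


def transfer_time(nbytes: float) -> float:
    """Estimated one-way time in seconds for a message of nbytes bytes."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes: float) -> float:
    """Bytes per second actually achieved for a message of nbytes bytes."""
    return nbytes / transfer_time(nbytes)


def half_performance_length() -> float:
    """Message size in bytes at which half of the peak bandwidth is reached.

    At this size the start-up cost equals the transmission time.
    """
    return LATENCY_S * BANDWIDTH_BPS


if __name__ == "__main__":
    for size in (8, 720, 8192, 1_000_000):
        t_us = transfer_time(size) * 1e6
        bw_mb = effective_bandwidth(size) / 1e6
        print(f"{size:>9} bytes: {t_us:10.2f} us, {bw_mb:7.2f} MB/s effective")
```

Under this model, messages much shorter than the half-performance length are dominated by start-up cost. That is the regime in which the small exchanges on coarse multigrid levels would typically fall, which is one plausible reading of why low latency helps multigrid in particular.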
Related resources
MPP Solution of Rayleigh-Bénard-Marangoni Flows
A domain decomposition strategy and parallel gradient-type iterative solution scheme have been developed and implemented for computation of complex 3D viscous flow problems involving heat transfer and surface tension effects. Special attention has been paid to the kernels for the computationally intensive matrix-vector products and dot products, to memory management, and to overlapping communic...
Parallel Implementation of a 3-D Subband Decomposition Algorithm for Digital Image Sequence Compression on the Cray T3D
This paper presents an efficient massively parallel implementation on the CRAY T3D of a digital image sequence compression scheme based on a 3-D subband decomposition. This compression method has been selected to be implemented on the CRAY T3D for its high potential of parallelization, its high computational complexity and its scientific interest. This implementation has been performed in C, using...
Parallel Iterative Solvers and Preconditioners Using Approximate Hierarchical Methods (An Extended
In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significa...
CellFlow: A Parallel Rendering Scheme for Distributed Memory Architectures
CellFlow is an animation system that exploits frame coherency to implement a lookahead scheme of object dataflow. The implementation of this scheme uses the communication features of modern scalable multicomputers to achieve good speedup by means of latency hiding. We demonstrate the performance of our approach in the field of volume rendering by implementing incremental rotation of the volumet...
A Cray T3D Performance Study
We carry out a performance study using the Cray T3D parallel supercomputer to illustrate some important features of this machine. Timing experiments show the speed of various basic operations while more complicated operations give some measure of its parallel performance.
Journal: Parallel Computing
Volume 22
Publication date: 1996